On-line construction of compact suffix vectors and maximal repeats
Authors
Abstract
A suffix vector of a string is an index data structure equivalent to a suffix tree. It was first introduced by Monostori et al. in 2001 [12,13,14]. They proposed a linear-time algorithm for constructing an extended suffix vector, followed by another linear-time algorithm that transforms an extended suffix vector into a more space-economical compact suffix vector. We propose an on-line linear-time algorithm for directly constructing a compact suffix vector. We show not only that a compact suffix vector can be built directly, but also that this on-line construction can be faster than the construction of the extended suffix vector. We then formalize the relation between suffix vectors and compact suffix automata, which leads to an efficient method for computing maximal repeats using suffix vectors.
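To make the notion of a maximal repeat concrete, the following is a minimal brute-force sketch. It is not the suffix vector method of the paper. It assumes one common definition: a substring is a maximal repeat if it occurs at least twice and extending it by one character to the left or to the right always yields a string with fewer occurrences. The function name `maximal_repeats` is hypothetical, and the running time is polynomial rather than linear.

```python
from collections import defaultdict


def maximal_repeats(text):
    """Return {repeat: sorted start positions} by brute force.

    Assumed definition: a substring that occurs at least twice and whose
    occurrences are preceded by at least two distinct characters and
    followed by at least two distinct characters (a text boundary counts
    as its own distinct character).
    """
    n = len(text)

    # Record every start position of every non-empty substring.
    occurrences = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occurrences[text[i:j]].append(i)

    result = {}
    for w, starts in occurrences.items():
        if len(starts) < 2:
            continue  # not a repeat
        # Characters adjacent to each occurrence; None marks a text boundary.
        left = {text[i - 1] if i > 0 else None for i in starts}
        right = {text[i + len(w)] if i + len(w) < n else None for i in starts}
        # Left- and right-diverse: no one-character extension keeps all occurrences.
        if len(left) > 1 and len(right) > 1:
            result[w] = starts
    return result


print(maximal_repeats("banana"))
```

For the string "banana", this sketch reports "a" and "ana" as maximal repeats, while "an" and "na" are rejected because each can be extended by one character without losing an occurrence. An index such as a suffix tree or suffix vector exists precisely to avoid enumerating all substrings as done here.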
Similar resources
Efficient repeat finding via suffix arrays
We solve the problem of finding interspersed maximal repeats using a suffix array construction. As is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves on the suffix-tree-based approaches to the repeat finding problem, being particularly well suited for very large inputs. We prove the correctness and complexity of th...
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
Composite Repetition-Aware Data Structures
In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time ...
Suffix Tree
Synonyms: Compact suffix trie. Definition: The suffix tree S(y) of a non-empty string y of length n is a compact trie representing all the suffixes of the string. The suffix tree of y is defined by the following properties: • All branches of S(y) are labeled by all suffixes of y. • Edges of S(y) are labeled by strings. • Internal nodes of S(y) have at least two children. • Edges outgoing an intern...
Finding Maximal Repeats with Factor Oracles
Factor oracles, built from an input text, are automata similar to suffix automata that accept at least all substrings of the input text. In papers [LL00] and [LLA02], factor oracles are used to detect repeats in text. Although the repeats found with these methods are not maximal, the average error is very low and the algorithm runs quite fast. In this paper, we present two ideas to improve accuracy of t...
Journal:
- Theor. Comput. Sci.
Volume 407
Publication date: 2008